(54) A system for locating automatically video segment boundaries and for extraction of key-frames



(57) The present invention describes an automatic video content parser for parsing video shots such
that they are represented in their native media and are retrievable based on their visual contents.
The system provides methods for temporal segmentation of video sequences into individual camera
shots using a novel twin-comparison method. The method is capable of detecting both camera shots
implemented by sharp breaks and gradual transitions implemented by special editing techniques,
including dissolve, wipe, fade-in and fade-out. The system also provides content based key frame
selection of individual shots by analysing the temporal variation of video content and selecting a
key frame once the difference in content between the current frame and a preceding selected key
frame exceeds a set of preselected thresholds.

Description

The present invention relates to video indexing, archiving, editing and production and, more
particularly, this invention [...] to our advantage is what this present invention is addressing.
Information about many major aspects of the world, in many cases, can only be successfully managed
when presented in a time-varying manner such as video sources. [...] the effective use of [...]
organization and retrieval of information from these sources. Also, the time-dependent [...]
difficult medium to manage. Much of the vast quantity of video containing valuable information
[...]. This is because indexing requires an operator to view the entire video package and to assign
index means manually to each of its scenes. Obviously, this approach is not feasible considering
the abundance of unindexed videos and the lack of sufficient manpower and time. Moreover, without
an index, information retrieval from video requires an operator to view [...] retrieval techniques
based on text. Therefore, there is clearly a need [...] way as a book with index structure and a
table of content. Prior art teaches segmentation [...] camera breaks, but no method detects gradual
transitions implemented by special editing techniques [...] fade-in and fade-out. Prior art such as
[...] Proc. 2nd Working Conf. on Visual Database Systems, Budapest, 1991, pp. 119-133 by
A. Nagasaka and Y. Tanaka and "Video Handling Based on Structured Information for Hypermedia
Systems," Proc. Int'l Conf. on Multimedia Information Systems, Singapore, 1991, pp. 333-344 by
Y. Tonomura teach segment boundaries detection methods but only [...]

It is thus an object of the present invention to automate temporal segmentation of video sequences
into individual camera shots by distinguishing between sharp breaks and gradual transitions
implemented by special effects. Such partition is the first step in video content indexing, which
is currently being carried out manually by an operator, a time consuming, tedious and unreliable
process.

It is another object of the present invention to provide content based key frame selection of
individual shots for representing, indexing and retrieval of the shots in a multimedia manner.

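Accordingly, the present invention describes an automatic video content parser for parsing video
shots such that they are represented in their native media and are retrievable based on their
visual contents. The system provides methods for temporal segmentation of video sequences into
individual camera shots using a novel twin-comparison method. The method is capable of detecting
both camera shots implemented by sharp breaks and gradual transitions implemented by special
editing techniques, including dissolve, wipe, fade-in and fade-out. The system also provides
content based key frame selection of individual shots by analysing the temporal variation of video
content and selecting a key frame once the difference in content between the current frame and a
preceding selected key frame exceeds a set of preselected thresholds.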
For a better understanding of the present invention and to show how the same may be carried into
effect, reference will now be made by way of example to the accompanying drawings in which:

FIG. 1A shows a flow chart of a known video segmentation method.

FIG. 1B shows an example of a typical sharp cut in a video sequence.

FIG. 2 is a block diagram showing an overview of a video content parsing system of the present
invention.

FIG. 3A shows a flow chart for performing temporal segmentation capable of detecting both sharp
cuts and gradual transitions implemented by special effects of the present invention.

FIG. 3B shows an example of a typical dissolve transition found in a video sequence.

FIG. 3C shows an example of the frame-to-frame difference values with high peaks corresponding to
sharp cuts and a sequence of medium peaks corresponding to a typical dissolve sequence.

FIG. 4 shows a flow chart for extracting key frames of the present invention.

FIG. 5 shows a flow chart of a system for automatically segmenting video and extracting key frames
according to the [...]

Various terminologies are used in the description of this present invention. Accordingly, for
better understanding of the invention, definitions of some of these terminologies are given as
follows.

A "segment" or a "[...]" is a single, uninterrupted camera shot, consisting of one or more frames.

A "transition" occurs when one moves [...] as a sharp break (which occurs between two frames
belonging to two different shots), or as special effects such as dissolve, wipe, fade-in and
fade-out, in which case the transition occurs across more than one frame.

"Temporal segmentation" or "scene change detection" of a video is to partition a given video
sequence temporally into individual segments by finding the boundary positions between successive
shots.

"Key frames" are those video frames which are extracted from a video segment to represent the
content of the video or segment in a video database. The number of key frames extracted from a
video sequence is always smaller than the number of frames in the video sequence. Thus, a sequence
of key frames is considered an abstraction of the video sequence from which they are extracted.

"Threshold" is a limiter used as a reference for checking whether a certain property has satisfied
a certain criterion, which is dictated by this limiter to define a boundary of a property. The
value of threshold t, used in pair-wise comparison for judging whether a pixel or super-pixel has
changed across successive frames, is determined experimentally and does not change significantly
for different video sources. However, experiments have shown that the thresholds T_b and T_t used
for determining a segment boundary using any of the difference metrics (as defined below) vary
from one video source to another.

"Difference metrics" are mathematical equations, or modifications thereof, adapted for analysing
the properties of, in this case, video content. The different difference metrics include:

• Pair-wise pixel comparison, whereby a pixel is judged as changed if the difference between the
intensity values in the two frames exceeds a given threshold t. This metric may be represented as
a binary function DP_i(k,l) over the domain of two-dimensional coordinates of pixels, (k,l), where
the subscript i denotes the index of the frame being compared with its successor. If P_i(k,l)
denotes the intensity value of the pixel at coordinate (k,l) in frame i, then DP_i(k,l) may be
defined as:

\[ DP_i(k,l) = \begin{cases} 1 & \text{if } |P_i(k,l) - P_{i+1}(k,l)| > t \\ 0 & \text{otherwise} \end{cases} \]



The pair-wise comparison algorithm simply counts the number of pixels changed from one frame to
the next according to the above metric. A segment boundary is declared if more than a given
percentage of the total number of pixels (given as a threshold T) have changed. Since the total
number of pixels in a frame of dimension M by N is M·N, this condition may be represented by the
following inequality:

\[ \frac{\sum_{k=1}^{M} \sum_{l=1}^{N} DP_i(k,l)}{M \cdot N} \times 100 > T \]

• Likelihood ratio is a comparison of corresponding regions or blocks in two successive frames
based on the second-order statistical characteristics of their intensity values. Let m_i and
m_{i+1} denote the mean intensity values for a given region in two consecutive frames, and let S_i
and S_{i+1} denote the corresponding variances. The ratio is defined as:

\[ \lambda = \frac{\left[ \frac{S_i + S_{i+1}}{2} + \left( \frac{m_i - m_{i+1}}{2} \right)^2 \right]^2}{S_i \cdot S_{i+1}} \]

A segment boundary is declared whenever the total number of sample areas whose likelihood ratio
exceeds the threshold t is sufficiently large (where "sufficiently large" depends on how the frame
is partitioned). This metric is more immune to small and slow object motion from frame to frame
than the preceding difference metric, so such motion is less likely to be misinterpreted as a
camera break.

• Histogram comparison is yet another algorithm that is less sensitive to object motion, since it
ignores the spatial changes in a frame. Let H_i(j) denote the histogram value for the i-th frame,
where j is one of the G possible pixel values. The overall difference SD_i is given by:

\[ SD_i = \sum_{j=1}^{G} \left| H_i(j) - H_{i+1}(j) \right| \]

A segment boundary is declared once SD_i is greater than a given threshold T.

• χ²-test is a modified version of the immediately preceding equation which makes the histogram
comparison reflect the difference between two frames more strongly.

These metrics may be implemented with different modifications to accommodate the idiosyncrasies of
different video sources. Unlike prior art, the video segmentation of the present invention does
not limit itself to any particular difference metric and a single specified threshold. As such it
is versatile.
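
The three metrics defined above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the invention: frames are assumed to be lists of rows of integer intensities, the function names are hypothetical, and the likelihood ratio uses the commonly cited form built from region means and variances (it is undefined when either variance is zero).

```python
def pairwise_pixel_difference(frame_a, frame_b, t):
    """Percentage of pixels whose intensity changes by more than t."""
    changed = 0
    total = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for p_a, p_b in zip(row_a, row_b):
            total += 1
            if abs(p_a - p_b) > t:
                changed += 1
    return 100.0 * changed / total


def _mean_and_variance(region):
    """Mean and population variance of all intensities in a region."""
    pixels = [p for row in region for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, variance


def likelihood_ratio(region_a, region_b):
    """Likelihood ratio of two corresponding regions (assumed standard form)."""
    m_a, s_a = _mean_and_variance(region_a)
    m_b, s_b = _mean_and_variance(region_b)
    numerator = ((s_a + s_b) / 2 + ((m_a - m_b) / 2) ** 2) ** 2
    return numerator / (s_a * s_b)


def histogram(frame, levels):
    """Count how many pixels take each of the `levels` possible values."""
    counts = [0] * levels
    for row in frame:
        for p in row:
            counts[p] += 1
    return counts


def histogram_difference(frame_a, frame_b, levels):
    """Sum of absolute bin-by-bin histogram differences (SD_i)."""
    h_a = histogram(frame_a, levels)
    h_b = histogram(frame_b, levels)
    return sum(abs(a - b) for a, b in zip(h_a, h_b))
```

A boundary would then be declared by comparing the returned value against the threshold chosen for that metric.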

FIG. 1A shows a flow chart of a known video segmentation method. This algorithm detects sharp cuts
in video sequences. After the standard system parameters are initialized at block 101, a first
reference frame is selected at block 102. The difference, D_i, between frames F_i and F_{i+S}
(which is a skip factor S away) is calculated based on a selected difference metric at block 104.
If D_i is greater than a preset threshold, a change of shot is detected and recorded before
proceeding to block 108. Otherwise, it proceeds directly to block 103 to establish a new frame. If
it is the last frame, the segmentation process is completed; otherwise, it proceeds to block 103
to repeat the process until the last frame of the video sequence is reached. The output is a list
of frame numbers and/or time codes of the starting and ending frames of each shot detected from
the input video sequence. This method is only suitable for detecting sharp transitions between
camera shots in a video or film such as that depicted in FIG. 1B. The content of shot 110 is
completely different from that of shot 112.
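
The known single-threshold method can be summarised by the following minimal sketch. It assumes each frame has already been reduced to a feature value and that `diff` is any difference metric supplied by the caller; the names `detect_sharp_cuts`, `features` and `diff` are hypothetical.

```python
def detect_sharp_cuts(features, diff, threshold, skip=1):
    """Return (i, i + skip) pairs whose difference exceeds one threshold."""
    cuts = []
    for i in range(0, len(features) - skip, skip):
        # Compare the reference frame with the frame one skip factor away.
        if diff(features[i], features[i + skip]) > threshold:
            cuts.append((i, i + skip))
    return cuts
```

Because only one threshold is used, a gradual transition whose individual frame differences stay below it goes undetected.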

FIG. 2 is a block diagram showing an overview of a video content parsing system of the present
invention. The system comprises a computer 202 containing a Program for Interface, block 210;
Data Processing Modules, block 212, for carrying out the preferred embodiments of the present
invention; and block 214 for logging the information associated with segment boundaries and key
frames. Connected to the computer are: User Interface Devices 204, optional Secondary Data Storage
Devices 206 and an input Video Source 208. The input video data can be either analog or digital,
compressed or uncompressed, and can be on any type of storage medium. This system can also receive
any type of video/TV/film standard.



A. First Embodiment of Invention 


FIG. 3A shows a flow chart for performing temporal segmentation capable of detecting both sharp
cuts and gradual transitions implemented by special effects of the first embodiment of the present
invention. This embodiment is different from the above described video segmentation algorithm of
FIG. 1A. It detects sophisticated transition techniques including dissolve, wipe, fade-in and
fade-out by using a novel twin-comparison method to find the starting and ending points of the
transitions. Prior art by Tonomura employed a histogram difference metric to calculate the
difference value between frames and to detect a scene change whenever the result is greater than a
specified threshold. In contrast, the present invention does not limit itself to any particular
difference metric or a single specified threshold. The novel twin-comparison method uses more than
one threshold to detect both sharp camera breaks and gradual transitions. It introduces another
difference comparison, namely accumulated difference comparison, as depicted in FIG. 3A.

Referring again to FIG. 3A, after initialization of system parameters at block 302 and loading of
frame i (the current frame) at block 304, the detection process compares the previous frame i-S
and the current frame i by calculating the difference, D_i, based on a selected difference metric
at block 306. If D_i is greater than a preselected shot break threshold, T_b, and Trans is false
(that is, not in a transition period yet), then it proceeds to block 308; otherwise, it goes to
block 314. At block 308, if Σ_F, the frame count in a shot, is not greater than the minimum number
of frames for a shot, then it proceeds to block 315 where Σ_F is incremented by S, a temporal skip
factor between two frames being compared in the detection, before continuing to process the
remaining frames in the video sequence at block 336; otherwise a cut is declared at point P1 and a
shot starting at frame F_s and ending at frame F_e is recorded at block 310, followed by
reinitialization of the start of a new frame, resetting the frame count Σ_F to zero and setting
Trans to false at block 312 before proceeding to block 336.

At block 314, D_i is checked against a larger threshold αT_b, where α is a user tunable parameter
greater than 1. If D_i is greater and Trans is true, then a confirmed cut is declared at point P1;
otherwise, it proceeds to block 316. Here, if Trans is found to be false, that is, not in a
transition period, D_i is compared against a predefined transition break threshold, T_t, at block
330. If D_i is greater than T_t and the frame count in a shot, Σ_F, is greater than the minimum
number of frames for a shot at block 332, a potential transition is found at P2; accordingly,
Trans is set to true and other relevant parameters are updated at block 334 before proceeding to
block 335. Otherwise, it goes to block 355 where Σ_F is incremented by S before continuing to
process the remaining frames in the video sequence.

However, if, at block 316, Trans is true, that is, it is in a transition period, D_i is further
compared against T_t. If it is not lesser, then Σ_F (frame count in a shot), Σ_ta (accumulative
differences in a transition sequence) and Σ_tF (frame count in a transition process) are updated
at block 317 before continuing to process the remaining frames in the video sequence; otherwise,
it proceeds to calculate D_a, the difference between the current frame i and the start frame of
the potential transition, F_p, that is, between the image feature of the current frame, f_i, and
the image feature of the potential transition frame, f_p. At block 316, if D_a is not greater than
T_b or Σ_ta/Σ_tF is not greater than γT_t (where γ ≥ 1), it goes to block 320; otherwise, the end
of a transition has been successfully detected at point P3. Accordingly, a transition starting at
F_s = F_p and ending at F_e is declared and recorded at block 322 before reinitializing relevant
parameters at blocks 324 and 326, and processing then continues with the next frame in the video
sequence. At block 320, if the failed frame count for the period of transition is not less than
the maximum allowable number of fails in a transition, then the current transition is found to be
false and deemed failed at block 328 at point P4; otherwise, the relevant counters are updated,
that is, it is still in a transition period, before proceeding to block 336. Here, the position of
the current frame is incremented by a predefined skip factor, S, to the start of the next frame
for processing; and if the end of the sequence has not been reached, the process is repeated from
block 304.
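
The core idea of the twin-comparison can be illustrated with a simplified sketch that works on a precomputed list of frame-to-frame differences. It is not the full flow chart of FIG. 3A: it omits the minimum shot length and the αT_b check, and it confirms a transition by comparing the accumulated difference with T_b rather than computing D_a between image features. The function name and the parameter `max_fails` (standing in for the maximum allowable number of fails) are hypothetical.

```python
def twin_comparison(diffs, t_b, t_t, max_fails=2):
    """Return (cuts, transitions) from a list of frame-to-frame differences.

    diffs[i] is the difference between frame i and frame i + 1.
    """
    cuts = []          # indices where a sharp cut was declared
    transitions = []   # (start, end) index pairs of gradual transitions
    start = None       # index where the current potential transition began
    accumulated = 0.0  # sum of differences since the potential start
    fails = 0          # consecutive low differences inside a transition

    for i, d in enumerate(diffs):
        if d > t_b:
            # Sharp break: record a cut and drop any pending transition.
            cuts.append(i)
            start, accumulated, fails = None, 0.0, 0
        elif start is None:
            if d > t_t:
                # Potential start of a gradual transition.
                start, accumulated, fails = i, d, 0
        elif d > t_t:
            accumulated += d
            fails = 0
        else:
            fails += 1
            if fails > max_fails:
                # Transition over: keep it only if the accumulated
                # difference is as large as a sharp break.
                if accumulated > t_b:
                    transitions.append((start, i - fails))
                start, accumulated, fails = None, 0.0, 0

    # Close a transition that is still open at the end of the sequence.
    if start is not None and accumulated > t_b:
        transitions.append((start, len(diffs) - 1 - fails))
    return cuts, transitions
```

The lower threshold marks where a transition may start, while the higher threshold is applied to the accumulated difference, which is what lets a run of medium peaks be recognised as one boundary.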

The flow chart depicted in FIG. 3A is capable of detecting gradual transitions such as a dissolve
as shown in FIG. 3B. The original scene 340 is slowly superimposed by another scene at 342, 344
and 346 and is finally fully dominated by that new scene 348. Effectively, the task of the
algorithm described above is to detect the first and last frames of the transition sequences in a
video sequence. FIG. 3C depicts an example of the frame-to-frame difference values, showing high
peaks corresponding to sharp cuts at 350 and 352, and a sequence of medium peaks corresponding to
a typical dissolve sequence 354.

The single-pass approach depicted in FIG. 3A has the disadvantage of not exploiting any
information other than the threshold values; thus, this approach depends heavily on the selection
of those values. Also, the processing speed is slow. A straightforward approach to reducing
processing time is to lower the resolution of the comparison, that is, examining only a subset of
the total number of pixels in each frame. However, this is clearly risky, since if the subset is
too small, the loss of spatial detail (if examining in the spatial domain) may result in a failure
to detect certain segment boundaries. A further improvement can be achieved by employing a novel
multiple-pass approach to provide both high speed processing (the amount of improvement depends on
the size of the skip factor and the number of passes) and accuracy of the same order in detecting
segment boundaries.

In the first pass, resolution is sacrificed temporally to detect potential segment boundaries with
high speed. That is, a "skip factor", S, is introduced in the video segmentation process. The
larger the skip factor, the lower the resolution. (Note that this skip factor is in general larger
than the one used in the description of the above and below flow charts.) For instance, a skip
factor of 10 means examining only one out of 10 frames from the input video sequence, hence
reducing the number of comparisons (and, therefore, the associated processing time) by the same
factor. In this pass, twin comparison for gradual transitions as described in FIG. 3A is not
applied. Instead, a lower value of T_b is used, and all frames having a difference larger than T_b
are detected as potential segment boundaries. Due to the lower threshold and large skip factor,
both camera breaks and gradual transitions, as well as some artifacts due to camera or object
movement, will be detected. False detections that fall under the threshold will also be admitted,
as long as no real boundaries are missed.

In the second pass, all computations are restricted to the vicinity of these potential boundaries
and the twin-comparison is applied. Increased temporal (and spatial) resolution is used to locate
all boundaries (both camera breaks and gradual transitions) more accurately, thus remedying the
low accuracy of locating the potential shot boundaries resulting from the first pass. Another
feature that can be implemented in the multiple-pass method is an option whereby different
difference metrics may be applied in different passes to increase confidence in the results.
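
The two passes can be sketched as follows, assuming each frame has been reduced to a feature value and `diff` is a caller-supplied difference metric. The function names are hypothetical, and for brevity the second pass below only looks for sharp cuts inside each candidate window, whereas the text applies the twin-comparison there.

```python
def first_pass(features, diff, skip, low_t_b):
    """Coarse pass: compare frames `skip` apart against a lowered threshold."""
    candidates = []
    last = len(features) - 1
    for i in range(0, last, skip):
        j = min(i + skip, last)  # do not run past the final frame
        if diff(features[i], features[j]) > low_t_b:
            candidates.append((i, j))
    return candidates


def second_pass(features, diff, candidates, t_b):
    """Fine pass: examine consecutive frames only inside candidate windows."""
    cuts = []
    for start, end in candidates:
        for i in range(start, end):
            if diff(features[i], features[i + 1]) > t_b:
                cuts.append(i)
    return cuts
```

With a skip factor of 10, the first pass performs roughly one tenth of the comparisons, and the second pass spends full resolution only where a boundary is suspected.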

B. Second Embodiment of Invention 

The second embodiment of the present invention pertains to the determination of the threshold
values used for determining segment boundaries, in particular the thresholds T_b, the shot break
threshold, and T_t, the transition break threshold. Selecting these threshold values is of utmost
importance because it has been shown that they vary from one video source to another. A "tight"
threshold makes it difficult for false transitions to be falsely accepted by the system, but at
the risk of falsely rejecting true transitions. Conversely, a "loose" threshold enables
transitions to be accepted consistently, but at the risk of falsely accepting false transitions.

In this invention, the automatic selection of threshold T_b is based on statistics of
frame-to-frame differences over an entire or part of a given video sequence. Assuming that there
is no camera shot change or camera movement in a video sequence, the frame-to-frame difference
value can only be due to three sources of noise: noise from digitizing the original analog video
signal, noise introduced by video production equipment, and noise resulting from the physical fact
that few objects remain perfectly still. All three sources of noise may be assumed to be random.
Thus, the distribution of frame-to-frame differences can be decomposed into a sum of two parts:
the random noises and the differences introduced by shot cuts and gradual transitions. Differences
due to noise do not relate to transitions. So T_b is defined as T_b = μ + ασ, where σ is the
standard deviation, μ is the mean difference from frame to frame and α is a tunable parameter with
α > 2. The other threshold, namely T_t, is used for detecting gradual transitions and is defined
as T_t = bT_b, where b is between 0.1 and 0.5, according to experiments conducted.
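
As a small worked sketch of this threshold selection, the following function computes both thresholds from a list of frame-to-frame differences. The function name and the default values of `alpha` and `b` are illustrative assumptions (chosen inside the ranges stated above), and the population standard deviation is used because the text does not say which estimator is intended.

```python
import statistics


def select_thresholds(diffs, alpha=3.0, b=0.3):
    """Return (T_b, T_t) with T_b = mu + alpha * sigma and T_t = b * T_b."""
    mu = statistics.mean(diffs)       # mean frame-to-frame difference
    sigma = statistics.pstdev(diffs)  # population standard deviation
    t_b = mu + alpha * sigma
    t_t = b * t_b
    return t_b, t_t
```

For example, differences with mean 5 and standard deviation 2 give T_b = 5 + 3 × 2 = 11 and T_t = 0.3 × 11 = 3.3 with these defaults.
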

C. Third Embodiment of Invention

FIG. 4 depicts the third embodiment of the present invention, an automatic content based key frame
extraction method using temporal variation of video content. After initializing system parameters
at block 502, the differences, D_i, between frames i-S and i are calculated and accumulated based
on a selected difference metric at block 506. If the accumulated difference, Σ_k, exceeds T_k, the
threshold for a potential key frame, then it sets the Flag to 1, as a potential key frame has been
detected, at block 508 and proceeds to block 510, where further verification is made by
calculating D_a, the difference between the current frame i and the last key frame recorded, F_k,
based on a selected difference metric. If, at block 511, D_a is greater than T_d, the threshold
for a key frame, then, at block 512, the current frame, F_i, is recorded as a current key frame,
and reinitialization of F_k as the current frame, Σ_k to zero, and f_k, the image feature of the
previous key frame, as the current image feature is carried out before repeating the process again
from block 504 if the end of the video sequence has not been reached. Otherwise, if, at block 511,
D_a is not greater than T_d, then it proceeds to analyse the next frame.

The method depicted in FIG. 4 is different from prior art. Prior art uses motion analysis, which
heavily depends on tracing the positions and sizes of the objects being [...] to extract a key
frame. This method is not only too slow but also impractical, since it relies on accurate motion
field detection and complicated image warping. In contrast, the present invention extracts key
frames purely based on the temporal variation of the video content as described in FIG. 4. These
key frames can then be used in video content indexing and retrieval.
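
The key frame selection loop can be sketched as below. This is a minimal illustration under stated assumptions: each frame is represented by a feature value, `diff` is a caller-supplied difference metric, a skip factor of 1 is used, and the first frame serves as the reference until a key frame has been recorded. The function and parameter names are hypothetical.

```python
def extract_key_frames(features, diff, t_k, t_d):
    """Return indices of frames selected as key frames."""
    key_frames = []
    if not features:
        return key_frames
    reference = features[0]  # stands in for the last key frame until one exists
    accumulated = 0.0        # accumulated difference since the last key frame
    for i in range(1, len(features)):
        accumulated += diff(features[i - 1], features[i])
        if accumulated > t_k:
            # Potential key frame: verify it against the last key frame.
            if diff(features[i], reference) > t_d:
                key_frames.append(i)
                reference = features[i]
                accumulated = 0.0
    return key_frames
```

The accumulated difference acts as a cheap trigger, and the direct comparison with the last key frame guards against selecting a frame after content has drifted away and then returned.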

OPERATION OF THE PREFERRED EMBODIMENTS 

FIG. 5 shows a flow chart of an operation of the preferred embodiments for a video parser of
FIG. 2. After selecting the difference metric(s), and therefore the image feature(s) to be used in
frame content comparison, by user or by default, as well as all the required system parameters at
block 602, the system loads a video sequence into the parser and digitizes it if this has not
already been done. Then, the difference, D_i, between consecutive video frames is calculated based
on the selected difference metric(s) at block 603. If D_i exceeds the shot break threshold, T_b,
and Trans is false (that is, not in a transition period) and the frame count in a shot, Σ_F, is
greater than the minimum preset number of frames for a shot, a cut is declared at point P1 and a
shot, starting at frame F_s and ending at frame F_e, is recorded; and the detection process
continues to process the following frames of the video sequence. However, if, at block 612, Σ_F is
not greater than [...], then a key frame is recorded at block 614.

At block 610, if the conditions are not true, and the conditions at the following check are still
not true, then a further check is required at block 620. At this check, if Trans is not true and,
at block 646, D_i is not greater than T_t, then it proceeds to block 640 where the accumulative
differences after the previous key frame, Σ_k, are incremented by D_i, and proceeds to block 641.
Here, if the accumulative differences are not greater than T_k, it goes to block 638; otherwise it
proceeds to block 650 to calculate D_a, the difference between features f_i and f_k. If, at block
644, the frame count in a shot, Σ_F, is greater than the minimum number of frames allowable for a
single shot, a potential transition is found at P2 and accordingly Trans is set to true and other
relevant parameters are updated at block 642; otherwise, it proceeds to block 640. Coming back to
block 652, if D_a is greater than [...], a key frame is detected at point P5 and appropriately
recorded at block 646; otherwise, it proceeds to block 633 where Σ_F is incremented by S before
continuing to process the remaining frames in the video sequence.

However, if at block 620 Trans is true, that is, it is in a potential transition phase, and, at
block 622, D_i is less than the transition break threshold, T_t, a further distinction is achieved
by calculating D_a, the difference between the current feature, f_i, and the previous feature, at
block 624. At block 626, if D_a is greater than βD_i (where β ≥ [...]) or the average of D_i
between the frames in the potential transition, Σ_ta/Σ_tF, is greater than γT_t (where γ ≥ 1),
then a transition is confirmed at point P3. A shot starting at frame F_s and ending at frame F_e
(= [...]) is recorded at block 628 before reinitialization of the relevant parameters at block
630. However, if, at block 626, the conditions are not true and, at block 632, the failed frame
count is not less than the maximum allowable number of fails in a transition, then a false
transition is declared at point P4 and, accordingly, Trans is set to false before proceeding to
block 640. At block 632, however, if the condition is satisfied, it proceeds to blocks 636 and 638
where the parameters are adjusted accordingly before it continues to process the next frames
within the video sequence. At block 622, if the condition is not met, it proceeds to block 621
where the concerned parameters are adjusted accordingly before proceeding to process the following
frames in the video sequence. The output data contains the frame numbers and/or time codes of the
starting and ending frames as well as key frames of each shot.

While the present invention has been described particularly with reference to FIGS. 1 to 5, it
should be understood that the figures are for illustration only and should not be taken as
limitations on the invention. In addition, it is clear that the methods of the present invention
have utility in many applications where analysis of image information is required. It is
contemplated that many changes and modifications may be made by one of ordinary skill in the art
without departing from the spirit and the scope of the invention as described.

Claims 

1. In a system for parsing a plurality of images in motion without modifying the media in which
the [...] originally, said images being further divided into plurality sequences of frames, a
method for selecting at least one key frame representative of a sequence of said images comprising
the steps of:

(a) determining a difference metric or a set of difference metrics between consecutive image
frames, said difference metrics having corresponding thresholds;

(b) deriving a content difference (D_i), said D_i being the difference between two current image
frames based on said difference metrics, the interval between said two current image frames being
adjustable with a skip factor S which defines the resolution at which said image frames are being
analysed;

(c) accumulating D_i between every two said consecutive frames until the sum thereof exceeds a
predetermined potential key threshold T_k;

(d) calculating a difference D_a, said D_a being the difference between the current frame and the
previous key frame based on said difference metrics, or between the current frame and the first
frame of said sequence based also on said difference metric if there is no previous key frame, the
current frame becoming the key frame if D_a exceeds a predetermined key frame threshold T_d; and

(e) continuing the steps in (a) to (d) until the end frame is reached, whereby key frames for
indexing sequences of images are identified and captured automatically.

2. In a system for parsing a plurality of images in motion without modifying the media in which
the [...] originally, said images being further divided into plurality sequences of frames, a
method for segmenting at least one sequence of said images into individual camera shots, said
method comprising the steps of:

(a) determining a difference metric or a set of difference metrics between consecutive image
frames, said difference metrics having corresponding shot break thresholds T_b;

(b) deriving a content difference (D_i), said D_i being the difference between two current image
frames based on said difference metrics, the interval between said two current image frames being
adjustable with a skip factor S which defines the resolution at which said image frames are being
analysed;

(c) declaring a sharp cut if D_i exceeds said threshold T_b;

(d) detecting the starting frame of a potential transition if said D_i exceeds a transition
threshold T_t but is less than said shot break threshold T_b;

(e) detecting the ending frame of a potential transition by verifying the accumulated difference,
said accumulated difference being based on said selected difference metrics; and

(f) continuing the steps in (a) to (e) until the end frame is reached, whereby sequences of images
having individual camera shots are identified and segmented automatically in at least one pass.

3. The method of video segmentation as in claim 2 wherein the processing speed of steps 4(a)-4(f)
is enhanced with a multi-pass method, said multi-pass method comprising at least two steps:

(a) in a first pass, resolution is temporarily decreased by choosing a substantially larger skip
factor S and a lower shot break threshold T_b so as to identify rapidly the locations of potential
segment boundaries without allowing any real boundaries to pass through without being detected;
and

(b) in subsequent passes, resolution is increased and all computation is restricted to the
vicinity of said potential segment boundaries whereby both camera breaks and gradual transitions
are further identified.

4. The method for video segmentation as in claim 3 wherein said multi-pass method can employ different difference 
metrics in different passes to increase confidence in the results. 

5. The method for video segmentation as in claim 3 wherein said first pass does not apply steps 4(a)-4(f).

6. The method for video segmentation as in claim 3 wherein said subsequent passes apply said steps 4(a)-4(f).

7. In a system for parsing a plurality of images [illegible] origin, said images being further divided into plurality
sequences of frames, a method for segmenting at least one sequence of said images into individual camera shots and
selecting at least one key frame representative of a sequence of said images, said method comprising the steps of:

(a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference
metrics having corresponding shot break thresholds T_b;

(b) deriving a content difference (D_i), said D_i being the difference between two current image frames based on said
difference metrics, the interval between said two current image frames being adjustable with a skip factor S
which defines the resolution at which said image frames are being analysed;

(c) declaring a sharp cut if D_i exceeds said threshold T_b;

(d) detecting the starting frame of a potential transition if said D_i exceeds a transition threshold T_t but is less than
said shot break threshold T_b;

(e) detecting the ending frame of a potential transition by verifying the accumulated difference, said accumulated
difference being based on said selected difference metrics;

(f) continuing the steps in (a) to (e) until the end frame is reached;

(g) deriving a content difference (D_i), said D_i being the difference between two current image frames based on
said selected image features and said difference metric, the interval between said two current image frames
being adjustable with a skip factor S which defines the resolution at which said image frames are being analysed;

(h) accumulating D_i between every two said consecutive frames until the sum thereof exceeds a predetermined
potential key threshold T_k;

(i) calculating a difference D_a, said D_a being the difference between the current frame and the previous key
frame based on said difference metric, or between the current frame and the first frame of the sequence based
also on said difference metric if there is no previous key frame, the current frame becoming the key frame if D_a
exceeds a predetermined key frame threshold; and

(j) continuing the steps in (a) to (i) until the end frame is reached, whereby sequence of
camera shots are identified and segmented automatically and key frames for indexing sequences of image are
identified and captured in at least one pass.
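
Steps (a) to (f) of claim 7 can be sketched in Python as follows. This is a hedged illustration, not the claimed implementation: the function name `twin_comparison`, the `max_fails` tolerance (modelled on the "maximum of allowed fails in a transition" noted in the drawings) and the use of plain numbers as stand-in frames are assumptions, and the accumulated difference is taken here to be the difference between the first and last frames of the candidate transition.

```python
from typing import Callable, List, Sequence, Tuple

Diff = Callable[[float, float], float]  # hypothetical difference metric


def twin_comparison(frames: Sequence[float], diff: Diff, t_b: float,
                    t_t: float, skip: int = 1,
                    max_fails: int = 2) -> List[Tuple[str, int, int]]:
    """Return ("cut" | "transition", start, end) tuples of frame indices."""
    boundaries: List[Tuple[str, int, int]] = []
    start = end = None  # extent of the current potential transition
    fails = 0           # consecutive frames below t_t inside a candidate

    def close_candidate() -> None:
        nonlocal start, end, fails
        # Step (e): accept the candidate only if the difference accumulated
        # across it reaches the shot break threshold (assumed form).
        if start is not None and diff(frames[start], frames[end]) > t_b:
            boundaries.append(("transition", start, end))
        start = end = None
        fails = 0

    for i in range(skip, len(frames), skip):
        d_i = diff(frames[i - skip], frames[i])       # step (b)
        if d_i > t_b:                                 # step (c): sharp cut
            close_candidate()
            boundaries.append(("cut", i - skip, i))
        elif d_i > t_t:                               # step (d)
            if start is None:
                start = i - skip
            end = i
            fails = 0
        elif start is not None:
            fails += 1
            if fails > max_fails:
                close_candidate()
    close_candidate()  # a candidate may still be open at the end frame
    return boundaries
```

The lower threshold `t_t` only opens a candidate; a boundary is declared when either a single difference exceeds `t_b` (a cut) or the change across the whole candidate does (a gradual transition), which is why two comparisons are needed.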

8. The method for video segmentation [illegible] the mean of the frame-to-frame difference μ and a multiple a of
the standard deviation of the frame-to-frame difference σ.

9. The method for video segmentation as in claim 8 wherein said multiple a has a value between 5 and 6 when the
difference metric is a histogram comparison.

10. The method for video segmentation as in claims 2 and 7 wherein said transition threshold T_t comprises a sum of a
multiple b of said shot break threshold T_b.
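
The thresholds of claims 8 to 10 can be illustrated with a short Python sketch. It rests on two assumptions: that claim 8 defines the shot break threshold as T_b = μ + a·σ (the legible part of the claim names only the mean, the multiple and the standard deviation), and that claim 10 amounts to T_t = b·T_b. The function names and the use of the population standard deviation are likewise hypothetical choices.

```python
import statistics
from typing import Sequence


def shot_break_threshold(diffs: Sequence[float], a: float) -> float:
    """Assumed form T_b = mu + a * sigma over frame-to-frame differences."""
    mu = statistics.fmean(diffs)
    sigma = statistics.pstdev(diffs)  # population form; an assumption
    return mu + a * sigma


def transition_threshold(t_b: float, b: float) -> float:
    """Assumed form T_t = b * T_b, reading claim 10 as a plain multiple."""
    return b * t_b
```

For the differences 1, 1, 1, 1, 6 the mean is 2 and the population standard deviation is 2, so a = 5 gives T_b = 12; a multiple b = 0.25 would then give T_t = 3.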






Flowchart, reference numerals 101 to 104 and 108. Legible steps:

Start.
Load in a video sequence, and select the frame size and temporal skip factor S.
Current frame F_i = first frame of the video.
Load in frame F_j.
Calculate the difference, D_i, between frames F_i and F_(i+S) based on a selected difference metric.

FIG.1A

(Prior Art)







FIG.1B 






Block diagram, reference numerals 202, 204, 206, 208, 210, 212 and 214. Legible block labels:

User Interface Devices
Secondary Data Storage Devices
Program For Interface
Data Processing Modules
Log Of Camera Shot
-segment boundaries
-key frames

FIG. 2






Flowchart, legible reference numerals 302, 304 and 306. Legible steps:

Initialization of system parameters: image features, difference metrics; thresholds and tunable parameters
[symbols illegible]; Trans = FALSE; i = start frame.
Load in video frame i.
Record i as F_s of the first shot.
Load in frame i.
Calculate the difference, D_i, between frames i-S and i based on a selected difference metric.
Declare a cut and record a shot. Start: F_s; End: F_e.
Set the start of a new shot; Trans = FALSE.

Note:

S: temporal skip-interval between two frames being compared in the detection;
S_fr: frame size;
T_b: shot break threshold;
T_t: transition break threshold;
i: current frame; i-S: last frame;
F_s: start frame of a shot;
F_e: end frame of a shot;
F_p: start frame of a potential transition;
Trans: flag to indicate whether in a transition;
f_i: image feature of frame i;
f_p: image feature of frame F_p;
D_i: difference between frames i-S and i;
D_a: difference between frame i and F_p.

Further entries whose symbols are illegible in this copy: user tunable parameters; frame count in a shot;
failed frame count in transition detection; minimum number of frames for a shot; frame count in a transition
process; accumulative differences in a transition; maximum of allowed fails in a transition.

FIG.3A






Flowchart (continued), legible reference numerals 317, 323, 324, 326, 330, 334 and 335. Legible steps:

Calculate difference D_a between frame i and F_p.
Declare a transition, record a shot. Start: F_s; End: F_e = F_p.
Trans = TRUE; two counters reset to 0 [symbols illegible].
Declare the potential transition fails.
Set the start of a new shot: F_s = i.
Trans = FALSE.
End.

FIG.3A (continued)







FIG.3B







Plot; legible axis label: Frame Number.

FIG.3C






Flowchart, legible reference numerals 502, 504, 509 and 510. Legible steps:

Start.
Initialization of system parameters: i = start frame; F_k = i.
Load in video frame i.
Load in frame i.
Calculate the difference, D_i, between frames i-S and i, based on a selected difference metric.
Calculate the difference D_a between frames i and F_k, based on a selected difference metric.
Record F_i as a key frame; F_k = i; accumulated difference reset to 0.

Note:

S_fr: frame size;
S: temporal skip factor;
T_k: threshold for potential key frame;
threshold for key frame [symbol illegible];
i: current frame; i-S: last frame;
F_k: last key frame recorded;
f_i: image feature of frame i;
f_k: image feature of previous key frame;
D_i: difference between frames i-S and i;
D_a: difference between frames F_k and i;
accumulative differences after previous key frame [symbol illegible].

FIG.4
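
The key frame loop of FIG.4, which corresponds to steps (g) to (i) of claim 7, can be sketched in Python as follows. This is an illustration under assumptions rather than the flowchart itself: the names `select_key_frames`, `t_potential` and `t_key` are hypothetical, frames are stood in for by plain numbers, and the accumulator is assumed to keep growing when a candidate frame fails the key frame test, a point the legible parts of the figure do not settle.

```python
from typing import Callable, List, Sequence

Diff = Callable[[float, float], float]  # hypothetical difference metric


def select_key_frames(frames: Sequence[float], diff: Diff,
                      t_potential: float, t_key: float,
                      skip: int = 1) -> List[int]:
    """Return indices of frames chosen as key frames."""
    key_frames: List[int] = []
    reference = 0  # first frame stands in until a key frame exists
    acc = 0.0      # accumulated D_i since the last key frame
    for i in range(skip, len(frames), skip):
        acc += diff(frames[i - skip], frames[i])
        if acc > t_potential:
            # Candidate: compare directly against the reference frame (D_a).
            if diff(frames[reference], frames[i]) > t_key:
                key_frames.append(i)
                reference = i
                acc = 0.0  # reset, as in the "Record ... key frame" box
    return key_frames
```

For stand-in frames 0, 1, ..., 10 with `t_potential = 2.5` and `t_key = 3.5`, frames 4 and 8 are selected: the accumulator flags frame 3 first, but its direct difference from the reference (3) does not exceed the key frame threshold.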






Flowchart, reference numerals partly illegible (602 legible). Legible steps:

Start.
Initialization of system parameters: S_fr, S; image features, difference metrics; two accumulators set to 0
[symbols illegible]; Trans = FALSE; i = start frame.
Load in video frame i.
Record i as F_s of the first shot.
Calculate the difference, D_i, between frames i-S and i based on a selected difference metric.
Declare a cut and record a shot. Start: F_s; End: [illegible].
Set a new shot: F_s = i; accumulator reset to 0 [symbol illegible].
Record a key frame.

Note:

S: temporal skip-interval;
S_fr: frame size;
T_b: shot break threshold;
T_t: transition break threshold;
F_s: start frame of a shot;
F_e: end frame of a shot;
i: current frame; i-S: last frame;
F_p: start frame of a potential transition;
Trans: flag to indicate whether in a potential transition process now;
f_i: image feature of frame i;
f_p: image feature of a potential transition or the last key frame;
D_i: difference between frames i-S and i;
D_a: difference between feature f_i and f_p.

Further entries whose symbols are illegible in this copy: user tunable parameters; frame count in a shot;
frame count in a transition; failed frame count in a transition; accumulative differences in a transition;
accumulative differences after last key frame; minimum number of frames for a shot; maximum of allowed fails in a
transition.

FIG.5






Flowchart (continued), reference numerals as scanned: 545, 624, 630, 636, 638, 642 and 648. Legible steps:

Calculate difference D_a.
Declare the potential transition fails; Trans = FALSE.
Declare the transition ends; record a shot. Start frame: [illegible]; End frame: F_e = i.
Set the start of a new shot; Trans = FALSE.
Trans = TRUE.
Record a key frame.

FIG.5 (continued)






(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 690 413 A3

(12) EUROPEAN PATENT APPLICATION

(88) Date of publication A3:
31.07.1996 Bulletin 1996/31

(51) Int Cl.6: G06T 7/20, G11B 27/10,
G06F 17/30

(43) Date of publication A2:
03.01.1996 Bulletin 1996/01

(21) Application number: 95304387.4

(22) Date of filing: 22.06.1995

(84) Designated Contracting States:
DE GB

(30) Priority: 27.06.1994 US 266216

(71) Applicant: Institute of Systems Science
Kent Ridge, Singapore 0511 (SG)

(72) Inventors:

• Zhang, Hong Jiang
Singapore 0410 (SG)

• Smoliar, Stephen William FX Palo Alto
Palo Alto, California 94004 (US)

• Wu, Jian Hua
Singapore 1027 (SG)

(74) Representative: Driver, Virginia Rozanne et al
Page White & Farrer
54 Doughty Street
London WC1N 2LS (GB)

(54) A system for locating automatically video segment boundaries and for extraction of
key-frames



(57) The present invention describes an automatic video content parser for parsing video shots such that
they are represented in their native media and retrievable based on their visual contents. This system provides
methods for temporal segmentation of video sequences into individual camera shots using a novel twin-comparison
method. The method is capable of detecting both camera shots implemented by sharp break and gradual
transitions implemented by special editing techniques, including dissolve, wipe, fade-in and fade-out; and
content based key frame selection of individual shots by analysing the temporal variation of video content and
selecting a key frame once the difference of content between the current frame and a preceding selected key
frame exceeds a set of preselected thresholds.



Block diagram reproduced on the front page, reference numerals 202, 204, 206, 210, 212 and 214. Legible block labels:

User Interface Devices
Secondary Data Storage Devices
Program For Interface
Data Processing Modules
Log Of Camera Shot
-segment boundaries
-key frames

FIG. 2






European Patent
Office

EUROPEAN SEARCH REPORT

Application Number: EP 95 30 4387

DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document (category letters illegible in this copy):

MULTIMEDIA SYSTEMS,
vol. 1, no. 1, 1993,
pages 10-28, XPG6057249S
ZHANG H. J. & AL: "Automatic partitioning
of full-motion video"
* page 12, right-hand column, paragraph 2 - page 15, right-hand column, paragraph 3.2 *
* page 17, left-hand column, paragraph 4 - page 19, right-hand column, paragraph 5 *
* page 26, right-hand column, paragraph 6 - page 27, right-hand column, paragraph 6.2 *

EP-A-0 555 873 (INTEL CORP) 16 August 1993
* the whole document *

Relevant to claim: 1-10

CLASSIFICATION OF THE APPLICATION (Int.Cl.6):
G06T7/20
G11B27/10
G06F17/30

TECHNICAL FIELDS SEARCHED (Int.Cl.6):
G06F
G06T

The present search report has been drawn up for all claims.

Place of search: THE HAGUE
Date of completion of the search: 4 June 1995
Examiner: Fournier C

CATEGORY OF CITED DOCUMENTS (legible entries)

Y : particularly relevant if combined with another document of the same category
O : non-written disclosure
E : earlier patent document, but published on, or after the filing date
L : document cited for other reasons
& : member of the same patent family, corresponding document




